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Abstract 

The multiple measurement vector problem (MMV) is a generalization of the compressed sensing 
problem that addresses the recovery of a set of jointly sparse signal vectors. One of the important 
contributions of this paper is to reveal that the seemingly least related state-of-art MMV joint sparse 
recovery algorithms - M-SBL (multiple sparse Bayesian learning) and subspace-based hybrid greedy 
algorithms - have a very important link. More specifically, we show that replacing the logdet(-) term 
in M-SBL by a rank proxy that exploits the spark reduction property discovered in subspace-based joint 
sparse recovery algorithms, provides significant improvements. In particular, if we use the Schatten-p 
quasi-norm as the corresponding rank proxy, the global minimiser of the proposed algorithm becomes 
identical to the true solution as p —^ 0. Furthermore, under the same regularity conditions, we show 
that the convergence to a local minimiser is guaranteed using an alternating minimization algorithm 
that has closed form expressions for each of the minimization steps, which are convex. Numerical 
simulations under a variety of scenarios in terms of SNR, and condition number of the signal amplitude 
matrix demonstrate that the proposed algorithm consistently outperforms M-SBL and other state-of-the 
art algorithms. 
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I. Introduction 

The multiple measurement vector problem (MMV) is a generalization of the compressed sensing 
problem, which addresses the recovery of a set of sparse signal vectors that share a common support 
HI, 0, 0, a, 0, 0. In the MMV model, let m and N denote the number of sensor elements 
and snapshots, respectively; and n > m denote the length of the signal vectors. Then, for a given 
noisy observation matrix Y = [yi,--- ,y7v] G ^ sensing matrix A G the multiple 

measurement vector (MMV) problem can be formulated as: 

minimize ||V||o (1) 

subject to IIV — ^V||i? < <5, 

where Xj G M" is the j-th signal, X = [xi, • • • ,X7v] G x* is the i-th row of X, and ||X||o = 

|suppV|, where suppX = {1 < i < n : x* 7 ^ 0} is the set of indices of nonzero rows in X. The 
Frobenius norm is used to measure the discrepancy between the data and the model. Classically, pursuit 
algorithms such as alternating minimization algorithm (AM) and MUSIC (multiple signal classification) 
algorithm Q, S-OMP (simultaneous orthogonal matching pursuit) 0,0, M-FOCUSS 0, randomized 
algorithms such as REduce MMV and BOost (ReMBo) i4l . and model-based compressive sensing using 
block-sparsity 0, ifTOl have been applied to the MMV problem. 

An algebraic bound for the recoverable sparisity level has been theoretically studied by Feng and 
Bresler 0, and by Chen and Huo 0 for noiseless measurement Y. More specifically, if A G 
satisfies AX = Y and 

II ^11 ^ spark(A) -f rank(V) - 1 
ll-^llo< - 2 -’ 

where spark(A) denofes fhe smallesf number of linearly dependenf columns of A, fhen X is fhe unique 
solufion of ([T]). This indicates that the recoverable sparsity level may increase with an increasing number 
of measurement vectors. Indeed, for noiseless measurement, a MUSIC algorithm by Feng and Bresler 0 
is shown to achieve the performance limit when the measurement matrix is full rank. However, except 
for MUSIC in full rank cases, the performance of the aforementioned classical MMV algorithms is not 
generally satisfactory, falling far short of 0 even for the noiseless case, when only a finite number of 
snapshots are available. 

In a noisy environment, Obozinski et al showed that a near optimal sampling rate reduction up 
to rank(F) can be achieved using I 1 /I 2 mixed norm penalty iflTI . A similar gain was observed in 
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computationally inexpensive greedy approaches such as compressive MUSIC (CS-MUSIC) lU and 
suhspace augmented MUSIC (SA-MUSIC) Q. More specifically, Kim et al d and Lee et al 0 
independently showed that a class of hybrid greedy algorithms that combine greedy steps with a so 
called generalized MUSIC subspace criterion d, or equivalently, with subspace augmentation 0, can 
reduce the required number of measurements by up to rank(y) in noisy environment. Furthermore, using 
a large system MMV model, Kim et al further showed that for an i.i.d. Gaussian sensing matrix, their 
algorithm can asymptotically achieve the algebraic performance limit when rank(y) increases with a 
particular scaling law d- Lee et al 0 also showed that MUSIC can do this in the noisy case and full 
rank, non-asymptotically with finite data, and for realistic Fourier sensing matrices. 

While the aforementioned mixed norm approach and subspace based greedy approaches provide 
theoretical performance guarantees, there also exist a very different class of powerful MMV algorithms 
that are based on empirical Bayesian and Automatic Relevance Determination (ARD) principles from 
machine learning. Among these, the so-called multiple sparse Bayesian learning (M-SBL) algorithm 
is best known |[T^ . Even though M-SBL is more computationally expensive than greedy algorithms 
such as CS-MUSIC or SA-MUSIC, empirical results show that M-SBL is quite robust to noise and to 
unfavorable restricted isometry property constant (RIC) of the sensing matrix ifT^ . Moreover, M-SBL 
is more competitive than mixed norm approaches. Since Bayesian approaches are very different from 
classical compressed sensing, such high performance appears mysterious at first glance. However, a recent 
breakthrough by Wipf et al unveiled that M-SBL can be converted to a standard compressed sensing 
framework with an additional logdet(') (log determinant) penalty - a non-separable sparsity inducing 
prior mi. The presence of the non-separable penalty term is so powerful that M-SBL performs almost 
as well as MUSIC. However, the guarantee only applies to the full row rank case of X with noise- 
free measurement vectors llT5l . lIT^ . However, despite its excellent performance, compared to the mixed 
norm approaches or subspace greedy algorithms, other than the work by Wipf et al |[T4l . the fundamental 
theoretical analysis of M-SBL has been limited. 

Therefore, one of the main goals of this paper is to continue the effort by Wipf et al lfT4l and 
analyze the origin of the high performance of M-SBL, as well as to investigate its limitations. One of 
the important contributions of this paper is to show that the seemingly least related algorithms - M-SBL 
and subspace-based hybrid greedy algorithms - have a very important link. More specifically, we show 
fhaf fhe logdet(') ferm in M-SBL is a proxy for fhe rank of a partial sensing matrix corresponding to 
the true support. We then show that minimising the rank that was discovered in subspace-based hybrid 
greedy algorithm to exploit the spark reduction property of MMV can indeed provide a true solution for 
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the MMV problem. Accordingly, replacing logdet(-) term in M-SBL by a Schatten-p quasi-norm rank 
proxy provides significant performance improvements. 

The resulting new algorithm is no longer Bayesian due to the use of a deterministic penalty based on a 
geometric argument, so we call the new algorithm subspace-penalized sparse learning (SPL) by excluding 
term “Bayesian”. We show that as p ^ 0 in the Schatten p-norm rank proxy, the global minimizers of 
the SPL cost function are identical to those of the original Iq minimization problem. Furthermore, we 
show that SPL can be easily implemented as an alternating minimization approach. 

Using numerical simulations, we demonstrate that compared to the current state-of-art MMV algorithms 
such as mixed norm approaches, M-SBL, CS-MUSIC/SA-MUSIC, and sequential CS-MUSIC IITTI . SPL 
provides superior recovery performance. Moreover, the results show that SPL is very robust to noise, and 
to the condition number of the unknown signal matrix. 

A. Notation 

Throughout the paper, x* and Xj correspond to the i-th row and the j-th column of matrix X, 
respectively. The (i,y) element of X is represented by Xij. When S is an index set, X^, and Ag 
correspond to a submatrix collecting corresponding rows of X and columns of A, respectively. For a 
matrix A, Tr(A) is the trace of a matrix A, A* is its adjoint, A^ denotes the Penrose-Moor psuedo- 
inverse, |A| refers the determinant, R{A) denotes the range space of A, and Pa (or Pr[a)) P^ (or 
^R{A)^ are the projection on the range space and its orthogonal complement, respectively. The vector e* 
denotes an elementary unit vector whose i-th element is 1 , and / denotes an identity matrix. 

A sensing matrix A € is said to have a /c-restricted isometry property (RIP) if there exist left 

and right RIP constants 0 < 5^, 5^ < 1 such that 

for all X € M"' such that ||x llo < k. A single RIP constant 6^ = max{(5^,(5^} is often referred to as the 
RIP constant. 

IT M-SBL: A Review 

A. Algorithm Description 

Under appropriate assumptions of noise and signal Gaussian statistics, one can show that M-SBL 
minimizes the following cost function in a so-called 7 space ifT^ : 


£^( 7 ) = Tr(S;iyy*) +iVlog|S^ 


( 3 ) 
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where 


T.y = XI + AT A* , r = diag( 7 ) . (4) 

With an estimate of F, which typically has a nearly sparse diagonal and may he thresholded to he exactly 
sparse, the solution of M-SBL is given hy 

X = ry4*(A/ + ATA*)-^Y . (5) 

One of the most important contributions hy Wipf is that the minimization problem of the cost function 
(l3]l can be equivalently represented as the following standard sparse recovery framework llT4l : 

imn£^(X), C^{X) = ||y - AX\\l + Xgmsbi{X) (6) 

where gmsui^) is a penalty given by 

gmsbl (X) = min G{X ; 7 ) (7) 

7>0 

where 

G(X; 7 ) = Tr + X log | A/ + AT A* \ . ( 8 ) 

Wipf et al IIT 4 I gave a heuristic argument showing that gmsbi corresponds to a non-separable sparsity pro¬ 
moting penalty, and proposed the following alternating minimization approach to solve the minimization 
problem 

1) Step 1: Minimization with respect to X: For a given estimate 7 ^*) at the f-th iteration, we can find 
a closed form solution for X in 

xW = rWx*(A/ + = diag(7W). 

For large scale problems, this can be computed using a standard conjugate gradient algorithm with an 
appropriate preconditioner. 

2) Step 2: Minimization with respect to In this step, for a given X^^'^ we need to solve the following 

7 ( 4 + 1 ) ^ argminG(xW, 7 ) 

7>0 


minimization problem: 


where 
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= Tr 


XW) +Xlog|Ey 


(9) 


Wipf et al lfT4l find the solution to VG{X^^\'y) = 0. More specifically, the derivative with respect to 
each component is given hy 



aG(xW,7) 

d-fi 


( 10 ) 


since 



Setting the derivative to zero after fixing T := this observation leads to the following fixed point 
update of 7 : 



( 11 ) 


B. Role of the Non-separable Penalty in M-SBL 

In order to develop a new joint sparse recovery algorithm that improves on M-SBL, we provide here 
a new interpretation of the role of the regularization term in M-SBL. Note that due to the non-negativity 
constraint for 7 , a critical solution to the minimization problem in (IT]) should satisfy the following first 
order Karush-Kuhn-Tucker (KKT) necessary conditions lIThl : 




Hence, as A —>• 0, this leads to the following fixed point equation: 



11 IItIIo < using the matrix inversion lemma, we have 



Pas + ^ + Ts) ^5 ) 


( 12 ) 
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where S = supp( 7 ) denotes a nonzero support of 7 and Pa^ denotes the orthogonal projection on the 
span of the columns of A indexed hy S. Accordingly, 

Im a*(A/ + = a*(A^)*r^^A^ai = ^, i e S. 

Therefore, we have 

lim 7 i = ^||x*||^, ieS. (13) 

A-s-O iV 

Suhstituting (fT3l ) into (|7]) yields 

9msbi{X) = minTr(X*r-iX) +iVlog|A/ + ArA*| 

TV |5| + A^log|AI +ATA*! 

= iV||7||o + iVlog|A/ + ArA*| (14) 

Note that the first term in (fT4l) imposes row sparsity on X since 7 * = 0 for ||x®|| = 0 due to ([T3]) . 
Hence, the first term of M-SBL penalty is in fact ||Ai||o. Then, what is the meaning of the log | • | term 
? Wipf et al IIT 4 I showed that the superior performance of the M-SBL is owing to the non-separahility 
of the term log |A/ + ArA*| with respect to 7 , which can avoid many local minimizers. In addition to 
this interpretation, the following section shows another important geometric implications of the logdet(-) 
term. 


III. Sub Space-Penalized Sparse Learning 

A. Key Observation 

In this section, we provide another interpretation of the M-SBL penalty, which suggests a new algorithm 
called suhspace-penalized sparse learning (SPL) that overcomes the limitation of M-SBL. Note that for 
any matrix Z € with m < n, we have 

m 

log|ZZ* + AI| = J]log(a2(Z) + A), (15) 

i=l 

where ai{Z) denote the singular values of Z. Therefore, the logdet() function is a concave proxy for 
nonzero singular values, hence in the limit of A —0, ([TS]) acts as a proxy for rank(Z) iTTll . HSj]. This 
leads us to another interpretation: the penalty term in M-SBL is equivalent to 

lim gmsbi{X) = iV||7l|o + X Rprox(Ar2) 


( 16 ) 


where Rprox(-) dentoes a rank proxy. Thus, the penalty simultaneously imposes the row sparsity of X 
as well as the low rank of the matrix AT 2 . By inspection, the first sparsity penalty term in (fT^ is quite 
intuitive, hut it is not clear why Rank(Ar 2 ) needs to he minimized. 

In fact, the main contrihution of this paper is that we need to replace the second term, Rprox(Ar5), 
hy geometrically more intuitive rank proxy as follows: 

9 spl{X) = iV|| 7 ||o + N Rprox(QMr^) (17) 

where Q denotes a basis for the noise suhspace denoted as R{Q) = In the following, we will 

descrihe in detail how we arrive at the new rank penalty. 

B. Subspace Criteria 

A solution X for Y = AX that satisfies ||Ar||o < m is called a basic feasible solution (BPS) |[T4l . 
Among BFSs, a solution of the following Iq MMV problem is called maximally sparse solution: 

(PO) : min||A||o , subject to Y = AX. (18) 

To address (fT^ . subspace-based greedy algorithms such as CS-MUSIC O] and SA-MUSIC Q exploit 
the spark reduction principle or or an equivalent subspace criterion using an augmented signal subspace. 
More specifically, if r = rank(y) and k denotes the number of the non-zero rows, the algorithms first 
estimate k — r partial support index R-r, then the remaining r components of the support are found 
using the subspace criterion. 

One of the main contributions of this paper is that this two step approach is not necessary. Instead, 
for noiseless measurements, a direct minimization of the rank of Q* Aj with respect to index I, |I| > k, 
still guarantees to obtain the true support as shown in Theorem 13. II We believe that this is an extremely 
powerful result that provides an important clue to overcome the limitation of existing greedy subspace 
methods Ifll. Il6l . 

Theorem 3.1: Assume that A E X E Y E satisfy AX = Y, where ||A"||o = k, 

and the columns of Y are linearly independent. If A satisfies a RIP condition 0 < < 1, fhen 

we have 

k — r = min rank {Q* At) , 

\I\>k 


and 


suppX = arg min rank {Q*Ai). 
\i\>k 


Proof: Since min|/|=; rank((5*j4/) is a nondecreasing function of I, we may consider 
min|/|=fc rank(Q*^/). By the rank-nullity Theorem, dim(A^(Q*^/)) + dim(i?((5*^/)) = A: for / C 


{1, • • • ,n} with |I| = k. Furthermore, because N{Q*) = Y, 

N{Q*Ai) = {v e : A/v G R{Y)} (19) 

and because r < k, N{Aj) = {0} by the RIP condition 0 < 52k-r+i{^) < 1- Since N{Ai) = {0}, we 
have dim(A^((5*A/)) = dim(i?(y) n R{Ai)) so that 

dim(y(gM/)) = dim(i?(y) n R{Ai)) < dim(i2(y)) = r, (20) 

which also implies that iank{Q*Aj) > k — r for any |/| = k. Hence, denoting supp(y) = S, it is 
enough to show that 

rank(Q*^ 5 ) = fe — r (21) 

and 

rank(g*74/) > k — r, for |/| = k and I S. (22) 


First we will show that (|2T]) holds. Because N{As) = {0} and Y = AsX^, we have that rank(y) = 
rank(y'^) or dim(i2(X'^)) = r. Also, since Y = AsX^, by ([T9l) . we have R{X^) C N{Q*As). Hence 
dim(A^(Q*A 5 )) > r. On the other hand, by (l20l) . the dimension of N{Q*As) is at most r, which implies 
dim(iV(Q*A 5 )) = r, so that we have rank(g*A 5 ) = k — r. 

Then, (l22l) is the only a remaining part to prove. Suppose that we have an index set I C {1, • • • , n} such 
that |/| = k and rank(g*A/) = k — r. Then, it must hold that 

dim(y(QM/)) = dim(i?(y) n R{Ai)) = r = dimi?(y). 

It follows that i?(y) = R{Y) n R{Aj) C R{Aj). Then, for each column of Y, y* € R{Y) C R{Aj) 
so that there exists Zj G such that AjZi = y^. Then, A/[zi • • • z^] = [yi • • • y^] = y so that there 
is a X G such that Y = AX with suppX C I. 

Since rank(y) = r and (x*)^ = 0 for any i ^ S = suppX, it follows that X^ has rank r. Hence, 
because the row rank of a matrix equals its column rank, X^ must have r linearly independent rows. 
Therefore, there is a subset Z of suppX such that \Z\ = r, and the rows of X^ are linearly independent. 

Since X^ G for every i € Z there is a nonzero vector Wj G N{X^\^’‘^), so that we have 

||2fwj||o < k — r + I and i G supp(Xwj), since x*Wj 7 ^ 0 by the linearly independence of the rows of 
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. Since Y = AX = AX, we have A{X'Wi — Xwj) = 0. 

Now, because supp(X) C I, we have supp(Xwi) C supp(X) C I. Hence ||Xwj|| < |/| = k. It 
follows that 

||Xwj — Xwj||o < k — r + 1 + \I\ = 2k — r Y 1. 


Hence, by the RIP of A, we must have Xwj = Xwj. Since supp(Xw 4 ) C I, we also have supp(Xwj) = 
supp(Xwj) C I, which implies that i € supp(Xwi) C I. Since i can be any element in suppX, we 
have i G I for any i € suppX. It implies that suppX C / so that I = suppX since |/| = |suppX| = k. 
Hence, in order to satisfy Tank{Q*Aj) = k — r and |/| = k, we must have I = suppX. ■ 


C. The SPL Penalty 

Note that minimizing rank with respect to 7 is equivalent to finding the index set I that 

minimizes rank {Q*Ai). Hence, Theorem 13 .1 1 implies that minimizing rank(Q*Ar 2 ) under the constraint 
II 7 II 0 > k will find fhaf has non-zero values for indices corresponding fo suppX*, where ||3f*||o = k 
and Y = AX^ . This observation leads to the second term in the SPL penalty of (fTTl ) as a rank proxy to 
exploit this geometric finding. 

Moreover, rather than just using logdet(') as in M-SBL, in this paper, we use more general family of 
rank proxies that still satisfy our goals. Specifically, our rank proxy is based on Schatten-p quasi norm 
with 0 < p < 1 that includes the popular nuclear norm as a special case. For a matrix W G the 

Scatten p-norm proxy for the rank is defined as 

m 

Tt\W\p = Tr , (23) 

i=l 

which corresponds to the nuclear norm when p = 1. Following the derivation that leads to ([7]), we propose 
the following SPL penalty: 


9SPL {X) = min Gspl (24) 

7>0 

where 

Gspl{i,X) = Ti{X*T-^X)+N1y({Q*ATA*Q)-^'^ . (25) 

Using the proposed SPL penalty, we formulate the following noiseless SPL minimization problem: 

mingspLiX), subject to U = AX . (26) 


II 


Note that TV {{Q*ATA*Q)^j is a concave function with respect to its singular values, so we can find 
its convex conjugate: 


Qspl{i,X)^ min 


( 27 ) 


where <Sspl{i,X,'^) is given as 


®SPl(7, X, ^) = tv (X*r-iX) + Np (^Tr {{Q*ArA*Q)'^) - |Tr(^i) 


(28) 


for q such that l/(p/ 2 ) + ll{ql2) = 1 ; and So+ denotes the set of symmetric positive semi-definite 
matrices: 


So+ = {X G S : X 7 0 }. 


The relationship between (|2^ and (l25l) can he clearly understood hy minimizing (|2^ with respect to T'. 
Indeed, using c)TV(^T')/c)T' = A* and c)Tr('I')'?/^(9'I' = jT^ . we have 


T- = {Q*ArA*Q)^^ 


(29) 


and 



Here, {Q*ATA*Q)'‘/'^-^ should he understood as applying the power operation to the non-zero singular 
values of Q*ATA*Q while retaining zero singular values at zero. 


Notice that 05 pj;,( 7 , X, T') is a surrogate function that majorizes ^spl(X, 7 ). Although like (1251) . 
(| 2 ^ is not jointly convex with respect to the different variables, the reason to prefer (| 2 ^ over (|25]) is 
that (|28] ) is convex with respect to each of the variables 7 , X, and T' with the other held constant, and 
we can obtain a closed-form expression in each step of alternating minimization. Specifically, recall fhaf 
the SPL penalty is given by 


9spl{X) 


mm 

7>0,'I'gSo+ 


®5PL(7)7f, T') 


( 30 ) 


®SPl(7, TV, = Tr (X*r-^X) + Np (tt {{Q*ATA*Q)^) - ^Tr(T'i) 


where 
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Let S denotes the non-zero support set of X. Using the KKT condition with respect to 7 , we have 


d^spL{'y,x, 

d-fi 


X' 


i\\2 


IJ-i = 


Ti 


+ Nsi*Q^!Q*Sii — /Xj = 0, Vi 


IJ,ai = 0, /Uj > 0, 7* > 0, Vi 


(31) 

(32) 


which leads to 7 * = 0 for i ^ S, whereas for i € S' 


r)0ScT3T IIxMP 1 

= 0 = + N^i^*Q{Q*AVA*Q)^-Q*a.i . 

on n 

Hence, Tr = XTr (^{Q*ATA*Q)^^^'^ = XTr {{Q* AT A* Q)-^) and we have 

9spl{X) = 2XTr ((QMLX^Q)?)) = 2XTr|gMr^|P. 


This implies that at the KKT point, the SPL penalty has cost function values equivalent to the Schatten-p 
quasi-norm rank penalty for Q*AT^. 


IV. The SPL Algorithm 

A. Alternating Minimization Algorithm 

So far, we have analyzed the global minimizer for the noiseless SPL algorithm. For noisy measurement, 
we propose the following cost function: 

min ||y - AXfp + XgsPL{X) . (33) 

X 

By letting A —0, the solution of (1^ becomes a solution of (l26l) when rank(Ar^/^) = m since then 
the constraint is automatically satisfied as follows: 

lim AX(X) = lim ATA*(XI + ArAU“^y = Y 

A->0 A^O 

Similar equivalence can be hold for rank(Ar^/^) < m if Y € R{AT^/'^). Therefore, rather than dealing 
with Eqs. (|2^ and (l3^ separately, we use (l3^ and the limiting argument to discuss a noiseless SPL 
optimization problem. 

Then, using (l30l) . a noisy SPL formulation can be written as 

min (^(X, 7 ,'!') (34) 

X,7>0,'fGSo+ 
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where the augmented cost function is given hy 



While (^(X, 7 ,'!') is not convex for all these variables simultaneously due to the presence of the bi¬ 
convex terms Tr and Tr it is convex with respect to each variable X, 7 

and 'h separately. Indeed, this is a typical example of the d.c. algorithm (DCA) for the difference of 
convex functions programming Il 20 l . 121 > and the alternating minimization algorithm converges to a local 
minimizer or a critical point. 

Specifically, a critical solution should satisfy the following first order Karush-Kuhn-Tucker (KKT) 
necessary conditions ifThll : 


gg(X,7,^) 

dX 

9C(X,7,^) 


Np{Q*ArA*Q) - = 0 


2A*{Y - AX) + \T-^X = 0 


(37) 


(36) 


9^- 


dC{X,^,^) 


- 2-+ Na*Q^Q*ai - lH = 0, Vi 

7i 

0, lJ,i>0, 7* > 0, Vi 



(38) 


lUJi 


(39) 


This leads us to the following fixed point iterations: 


1) Minimization with respect to X: For a given estimate 7 V), ([36l) yields a closed form solution for 

XV+l); 


x(‘+i) = rWA*(A/ + ArWy4*)"^y, = diag(7W). 


2) Determination of^: For a given estimate 7 V), using (l37l ). we can find a closed-form solution for 


j_e_ ^ (gMrWA*Q)y^ . 

3) Estimation of For a given 'I'V) and X^^\ using Eqs. (l3^ and (l39l) . we have 



(40) 


Here, if / 0, we have the following update equation: 



(41) 


Note that the SPL updates appears similar to those of M-SBL except the 7 update by (|4TI) . which is 
now modified based on subspace geometry. This is the main ingredient for the performance improvement 
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of SPL over M-SBL. In the following, we further discuss several important properties of the SPL penalty. 


B. Properties of the SPL Penalty 


An interesting case occurs when = 0. In this case, based on (l40l ). we have the following 


two observations: 1) 00 when 7 ^ 0; and 2) 7 }* can be an arbitrary positive number C 

when = 0 since the equality in (|4^ is satisfied regardless of the choice of C. Therefore, we 

define fhe following 7 * updalq^: 



CX), 

C>0, 

wll^ II 

1* <3 <!-(*) Q. 


if aJ'QT'Wg^ai = 0 and Hx^^^ ^ g 
if a|QT'W( 5 *aj = 0 and ||xW*|p = 0 
if a^QT-Wg^ai / 0 


(42) 


Thanks to (l42l) . even if ||x(*)*|p becomes erroneously zero during the iterations, there is a possibility, when 
T'(*)g*aj = 0 , for to become nonzero; hence, the corresponding row of can become nonzero 
once 7 j turns into nonzero. Note that this is very different from M-SBL, since in (fTTl) the denominator 
term cannot be zero even under the most relaxed RIP constraint < 1, so the condition ||x(*)*|p = 0 
will set the corresponding 7 ^^^ to zero. Therefore, in M-SBL, once a row of is set to zero in error, 
it will stay zero for all subsequent iterations and the algorithm is unable to recover from this error. 

Second, it is important note that since l/(p/2) -|- l/(g/2) = 1, we have limg^oP = 1-2jq ~ 

so 


lim gspdX) = lim 2iVTr|g*Ar^ 1^ = 2iVRank(g*A/), (43) 

q ->-0 q ->-0 


where I denotes the index set of non-zero diagonal elements of T. Hence, in this case, the SPL algorithm 
with p —> 0 is the algorithm that directly minimizes the rank of Q*Ai. In this case, the corresponding 
update rule is given by 


x(‘+i) = r^^^A*{XI + AT^^'^A*)-^Y 


(i) 

% = 


2 -||y(*)*I |2 
n\\^ II 


a*g(g*ArWA*g)-ig*ai 


(44) 


The main technical challenge is, however, that forp = q = 0 the cost function (1351) is not well-defined. 
Therefore, the aforementioned interpretation of the SPL should be understood as an asymptotic result 
such that p and q approach zero, but are not exactly zero. 


*In a practical implementation, a tolerance around 0 and finites values for 7 ^^*^ 


have to be used. 
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Next, as a by product of Theorem 13.11 the SPL algorithm is computationally more efficient than M- 
SBL. Note that the computational bottleneck of M-SBL (or SPL) is due to the the inversion of A* 
(or Q*AT^^'^A*Q, respectively). Specifically, unlike the update step that can be done using the 
conjugate gradient (CG) algorithm, the matrix inversion cannot be performed using CG and usually is 
performed using the singular value decomposition (SVD). Now, note that the size of matrix Q*AT^^^ A*Q 
in SPL is {m — r)x {m — r) compared to mxm for AT^^'> A* , which reduces the cost of matrix inversion 
for SPL compared to M-SBL. In particular, for the case of MUSIC where m = k + 1 and r = k, matrix 
inversion is not necessary for SPL whereas M-SBL still requires the mxm matrix inversion. 

Finally, note that the hyper-parameter 7 is closely related to spectral estimation. For example, for the 
case of MUSIC where m = k + 1 and r = k, the term {Q* AT^^'^ A*Q) in (l44l) reduces a scalar and we 
have 




_ II _ 

Q{Q*Ar(i)A*Q)-^Q* 



(45) 


1 iixWi 

= — , = X — (46) 

^a*QQ*ai ^yN{Q*AT(t)A*Q) 

where the first term is the MUSIC spectrum and the second term is related to the magnitude of the i-th 
row of Hence, in the case of full-row rank X (i.e., the MUSIC case), SPL can be regarded as an 
algorithm that initialises he non-zero support estimation using a spectral estimation technique, followed 
by alternating modification using the data fidelity matching criterion. 


V. NUMERICAL RESULTS 

In this section, we perform extensive numerical experiments to validate the proposed algorithm under 
various experimental conditions, and compare it with respect to existing joint sparse recovery algorithms. 
In particular, we are interested in the SPL algorithm in the asymptotic region of p —0 since it directly 
minimises the rank of Q*Ai. 

The elements of a sensing matrix A were generated from a Gaussian distribution with zero mean and 
variance of 1/m, and then each column of A was normalized to have an unit norm. An unknown signal 
X* with rank(X*) = r < k was generated using the same procedure as in || 6 l. Specifically, we randomly 
generated a support I, and then the corresponding nonzero signal components were obtained by 

Xi = UT.V , (47) 

where U € was set to random orthonormal columns, and S = diag([(Ti][^j^) is a diagonal matrix 
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whose i-th element is given by 

ai = T^, 0 < r < 1, (48) 

and V was generated using Gaussian random distribution with zero mean and variance of 1/N. 

After generating noiseless data, we added zero mean white Gaussian noise. We declared success if an 
estimated support from a certain algorithm was the same as a true suppX. 

As the proposed algorithm does not require a prior knowledge of the sparsity level, we need to define 
a stoping criterion. Here, the stopping criterion is defined by monitoring the normalized change in the 
variable 7 : 


- ^ iU 

2 

From our experiments, usually 20-30 iterations are required for SPL to converge. 

A. Local Minima Property 

We first perform experiments to confirm thaf SPL produces the true solution under milder conditions 
than M-SBL. To show this, using Ai-sparse signal Ai* generated by Wh with r = 1, we produced 
measurements Y = AX* such that r := rank(y) < k. Then, we initialized both algorithms with 
that satisfies the following: 

Y = AX^^\ ||X^°^||o = m, |suppX* nsuppX*^°^| = s , (49) 

where s < k — r, s = k — r and s > k — r, respectively. Note that the initialization corresponds to a local 
minimiser and we are interested in confirming that SPL can escape from the local minimizers thanks to 
the update in (l42h . Recall that it is difficult for M-SBL to avoid this type of local minimizers since X^^'i 
has zeros rows at the f-th row where i G S \ suppX^^^and the M-SBL update rule in (fTTI) cannot make 
the corresponding 7 ^ nonzero in the subsequent iterations. 

Figs. [Ila)-(c) illustrate the perfect recovery ratio from the initialization using SPL and M-SBL at 
various SNR conditions for (a) s = A: — r = 3, (b) s = 2 < /c — r, and (c) s = 5 > k — r, respectively. 
The results clearly demonstrate that SPL finds fhe global minimizer nearly perfectly, whereas M-SBL 
fails most of the time. This clearly confirms the theoretical advantages for SPL. 

B. Comparison with Other State-of-Art Algorithms 

To compare the proposed algorithm with various state-of-art joint sparse recovery methods, the recovery 
rates of various state-of-art joint sparse recovery algorithms such as MUSIC, S-OMP, SA-MUSIC, 


117W11 


IIt**) - T** 1)||2 
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(a) 


(b) 


(c) 


Fig. 1. Perfect recovery ratio from initialization using local minimizer that satisfy l l49b . The results are averaged after 500 
runs and the simulation parameters are: fc = 10, r = 7, n = 128 and N = 64. (a) s = fc — r = 3, (b) s = 2 < fc — r, and (c) 
s = 5 > k — r, respectively. 


sequential CS-MUSIC, M-SBL, and the I 1 /I 2 mixed norm approach are plotted in Fig. |2] along with 
those of SPL. Among the various implementation of mixed norm approaches, we used high performance 
SGPLl software If^ . which can he downloaded from http : (jwww.es.ubc.cajlabs jscljspgllj. Since 
M-SBL, the mixed norm approach, as well as SPL do not provide an exact /c-sparse solution, we used the 
support for the largest k coefficients as a support estimate in calculating the perfect recovery ratio. For 
MUSIC, S-OMP, SA-MUSIC, sequential CS-MUSIC, we assume that k is known. For suhspace based 
algorithms such as MUSIC, SA-MUSIC, sequential CS-MUSIC as well as SPL, we determine the signal 
suhspace using the following criterion 


max 




,m} (7i CTyyi 


> 0 . 1 , 


where ui > (T 2 > • • • > CTm denotes the singular values of YY*. A theoretical motivation for such 
suhspace determination is given in ||6l. Here, the success rates were averaged over 1000 experiments. The 
simulation parameters were as follows: m G {1,2,..., 50}, n = 128, k = 8,r = 5, SNR = 30dB, lOdB, 
and N € {32,128}, respectively. Figs. [2{a)-(d) illustrates the comparison results under various snapshot 
number and SNR conditions. Note that SPL consistently outperforms all other algorithms at various 
snapshots numbers. In particular, the gain increases with increasing number of snapshots, since it provides 
better subspace estimation. Also, note that SPL consistently outperforms M-SBL at all SNR ranges. 
Figs.|2}e)(f) illustrates that SPL significantly outperforms M-SBL when X is badly conditioned. Moreover, 
as the subspace estimation becomes accurate with increasing N, the performance gain becomes more 
significant. 

Figs. [3}a)(b)(c) compares the performance of various MMV algorithm by varying the sparsity 
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(e) (f) 


Fig. 2. Performance of various joint sparse recovery algorithms at n = 128, fc = 10, r = 6 when (a) SNR = 30dB,N = 
16, r = 1, (b) SNR = 30dB, N = 256, r = 1, (c) SNR = lOdB, = 16, r = 1, (d) SNR = lOdB, N = 256, t = 1, (e) 
SNR = 30dB, N = 16, r = 0.1, (f) SNR — 30dB, N = 256, r = 0.1, respectively. 
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(a) 


(b) 


(c) 


Fig. 3. Performance of various joint sparse recovery algorithms for varying sparsity level at = 256. The simulation parameters 
are (a) m = 40, r = 5, r = 1 and SNR=30dB, and (b) m = 40, r = 12, r = 1 and SNR=10dB, and (c) m = 40, r = 15, r = 0.5 
and SNR=30dB, respectively. 


level. Here, m and rank(y) are fixed and the sparsity levels changes, and we calculated the perfect 
reconstruction ratio. Again, SPL outperforms all existing methods for various SNR and conditions 
numbers. 

C. Fourier Measurements Cases 

Fig. |4] illustrates the results of the comparison when the measurement are from Fourier sensing matrix. 
Similar to Gaussian sensing matrix, consistent improvement of SPL over M-SBL and other algorithms 
under various conditions have been observed. 

VI. CONCLUSION 

In this paper, we derived a new MMV algorithm called subspace penalized sparse learning (SPL) to 
address a joint sparse recovery problem, in which the unknown signals share a common non-zero support. 
The SPL algorithm was inspired by the observation that the logdet(-) term in M-SBL is a rank proxy 
for a partial sensing matrix, and similar rank criteria exist in subspace-based greedy MMV algorithms 
like CS-MUSIC and SA-MUSIC. Furthermore, we proved that instead of rank(Ar^/^), minimizing 
rank(Q*Ar^/^) is a more direct way of imposing joint sparsity since its global minimizer can provide 
the true joint support. To impose such a subspace constraint as a penalty, the SPL algorithm employs 
the Schatten-p quasi norm rank penalty and was implemented as an alternating minimisation method. 
Theoretical analysis showed that as p —0, the global minimizer of the SPL is equivalent to the global 
minimiser of the Iq MMV solution. We further demonstrated that compared to M-SBL, our SPL is 
















20 





Fig. 4. Performance of various joint sparse recovery algorithms at n = 128 when the sensing matrix is from Fourier matrix 
and SNR=30dB. The simulation parameters are (a) r = 8, N = 16, r = 1, fc = 10, (b) r = 8, = 256, r = 1, fe = 10, (c) 
r = 5,m — 40, N = 256, r = 1, (d)r = 15, m = 40, N = 256, r = 0.5, respectively. 


more robust to recovering badly conditioned X*. With numerical simulations, we demonstrated that SPL 
consistently outperforms all existing state-of-the art algorithms including M-SBL. 

Acknowledgements 

This work was supported by Korea Science and Engineering Foundation under Grant NRF- 
2014R1A2A1A11052491. 


References 

[1] J. Kim, O. Lee, and J. Ye, “Compressive MUSIC: revisiting the link between compressive sensing and array signal 
processing,” IEEE Trans, on Information Theory, vol. 58, no. 1, pp. 278-301, 2012. 























21 


[2] J. Chen and X. Huo, “Theoretical results on sparse representations of multiple measurement vectors,” IEEE Trans, on 
Signal Processing, vol. 54, no. 12, pp. 4634^643, 2006. 

[3] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado, “Sparse solutions to linear inverse problems with multiple measurement 
vectors,” IEEE Trans, on Signal Processing, vol. 53, no. 7, p. 2477, 2005. 

[4] M. Mishali and Y. C. Eldar, “Reduce and boost: Recovering arbitrary sets of jointly sparse vectors,” IEEE Trans, on Signal 
Processing, vol. 56, pp. 4692^702, 2009. 

[5] E. Berg and M. R Friedlander, “Theoretical and empirical results for recovery from multiple measurements,” IEEE Trans, 
on Information Theory, vol. 56, no. 5, pp. 2516-2527, 2010. 

[6] K. Lee, Y. Bresler, and M. Junge, “Subspace methods for joint sparse recovery,” IEEE Trans, on Information Theory, 
vol. 58, no. 6, pp. 3613-3641, 2012. 

[7] P. Feng, “Universal minimum-rate sampling and spectrum-blind reconstruction for multiband signals,” Ph.D. Dissertation, 
University of Illinois, Urbana-Champaign, 1997. 

[8] J. Tropp, A. Gilbert, and M. Strauss, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal 
Processing, vol. 86, no. 3, pp. 572-588, 2006. 

[9] Y. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Trans, 
on Signal Processing, vol. 58, pp. 3042-3054, 2010. 

[10] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based compressive sensing,” IEEE Trans, on Information Theory, 
vol. 56, no. 4, pp. 1982-2001, 2010. 

[11] G. Obozinski, M. Wainwright, and M. Jordan, “Support union recovery in high-dimensional multivariate regression,” The 
Annals of Statistics, vol. 39, no. 1, pp. 1^7, 2011. 

[12] D. Wipf and B. Rao, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE 
Trans, on Signal Processing, vol. 55, no. 7 Part 2, pp. 3704-3716, 2007. 

[13] J. M. Kim, O. K. Lee, and J. C. Ye, “Improving noise robustness in subspace-based joint sparse recovery,” IEEE Trans, 
on Signal Processing (in press), 2012. 

[14] D. Wipf, B. Rao, and S. Nagarajan, “Latent variable bayesian models for promoting sparsity,” IEEE Trans, on Information 
Theory, vol. 57, no. 9, p. 6236, 2011. 

[15] D. Wipf, “Bayesian methods for finding sparse representations,” Ph.D. dissertation. University of California, San Diego, 
2006. 

[16] E. K. P. Chong and S. H. Zak, An Introduction to Optimization. New York: Wiley-Interscience, 1996. 

[17] K. Mohan and M. Fazel, “Iterative reweighted least squares for matrix rank minimization,” in 48th IEEE Annual Allerton 
Conference on Communication, Control, and Computing, 2010, pp. 653-661. 

[18] M. Fazel, H. Hindi, and S. Boyd, “Log-det heuristic for matrix rank minimization with applications to hankel and euclidean 
distance matrices,” m American Control Conference, 2003. Proceedings of the 2003, vol. 3. IEEE, 2003, pp. 2156-2162. 

[19] K. Petersen and M. Pedersen, “The matrix cookbook,” Technical University of Denmark, pp. 7-15, 2008. 

[20] P. Tao and L. An, “Convex analysis approach to dc programming: Theory, algorithms and applications,” Acta Mathematica 
Vietnamica, vol. 22, no. 1, pp. 289-355, 1997. 

[21] P. Tao and L. T. H. An, “A DC optimization algorithm for solving the trust-region subproblem,” SIAM Journal on 
Optimization, vol. 8, no. 2, pp. 476-505, 1998. 

[22] E. Van Den Berg and M. Friedlander, “Probing the pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific 
Computing, vol. 31, no. 2, p. 890, 2008. 



